Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow searching for similar images #21

Merged
merged 18 commits into from
Nov 7, 2021

Conversation

ramayer
Copy link
Contributor

@ramayer ramayer commented Oct 6, 2021

Still a work in progress for #20 . Not yet ready for merging; but creating the pull request early for comments on the patches.

A couple open issues/questions:

  • What user agent string (if any) should rclip use when it makes http(s) requests? I think it should have a user agent, because some websites like Wikimedia Commons are pretty picky and block requests without one ( https://meta.wikimedia.org/wiki/User-Agent_policy ). I just put in a temporary placeholder for now, mostly so wikimedia.org doesn't complain when I test with things like: ./bin/rclip.sh -f https://upload.wikimedia.org/wikipedia/commons/c/c1/Variegated_golden_frog_%28Mantella_baroni%29_Ranomafana.jpg

  • So far this has only been minimally tested. I've been using this:
    ./bin/rclip.sh ./tmp/d2/img0015.jpg
    looking for a score of 100% from the identical picture, and this
    ./bin/rclip.sh -t 999 "" --sub "" --sub ./tmp/d2/img0015.jpg | tail
    looking at a score of -100% to make sure that adding and subtracting is working; but I haven't really tried many combinations yet. I haven't tried on Windows at all yet.

  • This commit undid the optimization from the previous pull request of calculating many text vectors in a single statement. Not sure it's important. If so, I suppose it could use itertools.groupby to group the user's inputs into text vs image requests; and at least process all the text ones in a single group.

@yurijmikhalevich
Copy link
Owner

Hi, @ramayer. Thank you for opening this.

What user agent string (if any) should rclip use when it makes http(s) requests?

I'm cool with sticking to the one Wikimedia recommends for now and returning to this question when/if something will break.

looking at a score of -100% to make sure that adding and subtracting is working; but I haven't really tried many combinations yet. I haven't tried on Windows at all yet.

Good. Thank you for letting me know. I suggest also checking that regular queries perform exactly as before.

This commit undid the optimization from the previous pull request of calculating many text vectors in a single statement. Not sure it's important.

It would've been interesting to see the performance difference. If you'll be benchmarking it, please, include the call to compute_text_features in the benchmark code.

If so, I suppose it could use itertools.groupby to group the user's inputs into text vs image requests; and at least process all the text ones in a single group.

I like this idea. We may also process all of the pictures (if multiple were passed) in a single batch.

rclip/model.py Outdated Show resolved Hide resolved
rclip/model.py Outdated Show resolved Hide resolved
rclip/model.py Outdated Show resolved Hide resolved
rclip/model.py Outdated Show resolved Hide resolved
rclip/model.py Outdated Show resolved Hide resolved
@ramayer
Copy link
Contributor Author

ramayer commented Oct 7, 2021

Thx for the feedback. I agree with everything; but probably won't get around to it until the weekend.

@ramayer
Copy link
Contributor Author

ramayer commented Oct 13, 2021

Sry for the delay - I got distracted (actually on different aspects of rclip --- I was attempting to fine-tune CLIP to recognize my particular dog instead of all dogs in general - but didn't get too far) . Now estimating early this coming weekend.

@yurijmikhalevich
Copy link
Owner

@ramayer, no worries. Fine-tuning sounds fun. Thank you :)

@ramayer
Copy link
Contributor Author

ramayer commented Oct 17, 2021

I pushed an updated version with much of the above feedback.

  • phrase and image parameters are now grouped so multiple embeddings are calculated in groups instead of one-at-a-time.
  • download size limits and timeouts added
  • updated requirements.txt

Right now it's still in a number of separate commits; I'll squash them once it's getting close.

rclip/model.py Outdated Show resolved Hide resolved
rclip/model.py Outdated
Comment on lines 52 to 55
def image_from_file(self, query: str) -> Image.Image:
path = query.removeprefix('file://')
img = Image.open(path)
return img
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think str.removeprefix might be a python 3.9 dependency; and the Pipfile calls for 3.8. I could replace it with a re.sub() expression if 3.8 support's needed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ramayer, sounds good. Thank you!

rclip/model.py Outdated Show resolved Hide resolved
rclip/model.py Outdated Show resolved Hide resolved
@yurijmikhalevich
Copy link
Owner

@ramayer, thank you very much for the update. I indicated a few areas for improvement in the comment.

rclip/utils.py Outdated Show resolved Hide resolved
rclip/utils.py Outdated Show resolved Hide resolved
@@ -8,6 +8,7 @@ clip = {git = "https://github.com/openai/CLIP.git"}
torch = "==1.9.0+cpu"
torchvision = "==0.10.0+cpu"
pillow = ">=8.3.2"
requests = "~=2.26"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add it via pipenv install "requests~=2.26" instead of editing the Pipfile?

This will also update Pipfile.lock and will make the tests pass. I can do it if you don't want to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Struggling to get this to resolve without errors.

PS -- wondering what direction you want to take this project in the future. On my home machines it's now the main way I've been organizing my pictures; and I've hacked on a lot of features that might-or-might-not be out of scope for your vision for this project - including:

  • De-dup --- Embeddings are cool - they can even find duplicate thumbnails of different sizes. I added a script that finds pictures with identical (or near identical) embeddings, and flag them as a duplicate in a column I added to SQLite.
  • Support for explicit captions/tags/picking up words from folders --- for some images I already had them tagged with words that CLIP couldn't figure out. For example, folders of country names I've visited. I added a column for the CLIP embedding of the words I've manually tagged; and optionally also search that column. Without it, a search for 'germany' only returns about 1/10 of my pictures from germany.
  • Face embeddings -- so I can search for a specific person. I'm using https://github.com/serengil/deepface to add a column for faces. Currently I can't input the name of a face - but using the "similar to an existing image" I can find them. Combining this with the "explicit tagging" feature above I'll soon be able to search for people by name (as long as I have tagged them at least once)

No idea if you consider those in or our of scope for the main project.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ramayer, thanks for the update. I'll take a look at it.

About the direction:

  1. I would love to keep the rclip as a small and simple CLI tool, but considering building a more advanced app on top of it.
  2. I like the de-dup idea and think that it can be a part of the rclip CLI tool. But, I am not 100% sure about this one yet, so instead of filing a GHI, I created a Discussion here: Should we and how to add a de-dup functionality to rclip? #23.
  3. This is a must-have feature for a practical photo search tool, but implementing it will fundamentally change how rclip works. Also, if we will implement this in rclip it will start output pictures that don't feature anything German in them (if though they were taken in Germany; for example, a picture of a generic bed made in the hotel). I feel like this feature can be a part of another tool that will incorporate rclip. Or I should change how I think about rclip 😅 Let me take to some time to consider this. I also think that we may consider extracting manual/existing tags from the images themselves.
  4. Face embeddings feature sounds great, and my general attitude to it is more open compared to the one I have towards indexing the folder names.

Thank you for sharing all these. All good ideas that I need to consider.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ramayer, pipenv install "requests~=2.26" worked fine for me. But, you shouldn't forget to execute pipenv shell before doing anything with pipenv. I mention it here, in the Contributing section: https://github.com/yurijmikhalevich/rclip#contributing. Executing pipenv shell sets env variable from .env which is required to install the PyTorch with pipenv from the official PyTorch "repository."

@yurijmikhalevich
Copy link
Owner

yurijmikhalevich commented Oct 25, 2021

Thank you very much for the update. I've added a few comments. Once they are addressed, we should be good to merge.

@yurijmikhalevich yurijmikhalevich merged commit b8298c5 into yurijmikhalevich:main Nov 7, 2021
@yurijmikhalevich
Copy link
Owner

Updated and merged. Thank you!

@yurijmikhalevich
Copy link
Owner

@all-contributors please add @ramayer for code

@allcontributors
Copy link
Contributor

@yurijmikhalevich

I've put up a pull request to add @ramayer! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants